43 research outputs found

    Genome Halving by Block Interchange

    Get PDF
    We address the problem of finding the minimal number of block interchanges (exchange of two intervals) required to transform a duplicated linear genome into a tandem duplicated linear genome. We provide a formula for the distance as well as a polynomial time algorithm for the sorting problem

    Parallel Position Weight Matrices Algorithms

    Get PDF
    International audiencePosition Weight Matrices (PWMs) are broadly used in computational biology. The basic problems, Scan and MultipleScan, aim to find all the occurrences of a given PWM or a set of PWMs in long sequences. Some other PWM tasks share a common NP-hard subproblem, ScoreDistribution. The existing algorithms rely on the enumeration on a large set of scores or words, and they are mostly not suitable for parallelization. We propose a new algorithm, BucketScoreDistribution, that is both very efficient and suitable for parallelization. We bound the error induced by this algorithm. We realized a GPU prototype for Scan, MultipleScan and BucketScoreDistribution with the CUDA libraries, and report for the different problems speedups larger than 10× on several Nvidia cards

    Genome Halving by Block Interchange

    No full text
    International audienceWe address the problem of finding the minimal number of block interchanges (exchange of two intervals) required to transform a duplicated linear genome into a tandem duplicated linear genome. We provide a formula for the distance as well as a polynomial time algorithm for the sorting problem

    Manycore high-performance computing in bioinformatics

    Get PDF
    Mining the increasing amount of genomic data requires having very efficient tools. Increasing the efficiency can be obtained with better algorithms, but one could also take advantage of the hardware itself to reduce the application runtimes. Since a few years, issues with heat dissipation prevent the processors from having higher frequencies. One of the answers to maintain Moore's Law is parallel processing. Grid environments provide tools for effective implementation of coarse grain parallelization. Recently, another kind of hardware has attracted interest: multicore processors. Graphic processing units (GPUs) are a first step towards massively multicore processors. They allow everyone to have some teraflops of cheap computing power in its personal computer. The CUDA library (released in 2007) and the new standard OpenCL (specified in 2008) make programming of such devices very convenient. OpenCL is likely to gain a wide industrial support and to become a standard of choice for parallel programming. In all cases, the best speedups are obtained when combining precise algorithmic studies with a knowledge of the computing architectures. This is especially true with the memory hierarchy: the algorithms have to find a good balance between using large (and slow) global memories and some fast (but small) local memories. In this chapter, we will show how those manycore devices enable more efficient bioinformatics applications. We will first give some insights into architectures and parallelism. Then we will describe recent implementations specifically designed for manycore architectures, including algorithms on sequence alignment and RNA structure prediction. We will conclude with some thoughts about the dissemination of those algorithms and implementations: are they today available on the bookshelf for everyone

    A scenario of mitochondrial genome evolution in maize based on rearrangement events

    Get PDF
    Background: Despite their monophyletic origin, animal and plant mitochondrial genomes have been described as exhibiting different modes of evolution. Indeed, plant mitochondrial genomes feature a larger size, a lower mutation rate and more rearrangements than their animal counterparts. Gene order variation in animal mitochondrial genomes is often described as being due to translocation and inversion events, but tandem duplication followed by loss has also been proposed as an alternative process. In plant mitochondrial genomes, at the species level, gene shuffling and duplicate occurrence are such that no clear phylogeny has ever been identified, when considering genome structure variation. Results: In this study we analyzed the whole sequences of eight mitochondrial genomes from maize and teosintes in order to comprehend the events that led to their structural features, i.e. the order of genes, tRNAs, rRNAs, ORFs, pseudogenes and non-coding sequences shared by all mitogenomes and duplicate occurrences. We suggest a tandem duplication model similar to the one described in animals, except that some duplicates can remain. Thi

    Genome dedoubling by DCJ and reversal

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Segmental duplications in genomes have been studied for many years. Recently, several studies have highlighted a biological phenomenon called <it>breakpoint-duplication</it> that apparently associates a significant proportion of segmental duplications in Mammals, and the Drosophila species group, to breakpoints in rearrangement events.</p> <p>Results</p> <p>In this paper, we introduce and study a combinatorial problem, inspired from the breakpoint-duplication phenomenon, called the <it>Genome Dedoubling Problem.</it> It consists of finding a minimum length rearrangement scenario required to transform a genome with duplicated segments into a non-duplicated genome such that duplications are caused by rearrangement breakpoints. We show that the problem, in the Double-Cut-and-Join (DCJ) and the reversal rearrangement models, can be reduced to an APX-complete problem, and we provide algorithms for the Genome Dedoubling Problem with 2-approximable parts. We apply the methods for the reconstruction of a non-duplicated ancestor of <it>Drosophila yakuba.</it></p> <p>Conclusions</p> <p>We present the <it>Genome Dedoubling Problem</it>, and describe two algorithms solving the problem in the DCJ model, and the reversal model. The usefulness of the problems and the methods are showed through an application to real Drosophila data.</p

    ProCARs: Progressive Reconstruction of Ancestral Gene Orders

    Get PDF
    International audienceBackground: In the context of ancestral gene order reconstruction from extant genomes, there exist two main computational approaches: rearrangement-based, and homology-based methods. The rearrangement-based methods consist in minimizing a total rearrangement distance on the branches of a species tree. The homology-based methods consist in the detection of a set of potential ancestral contiguity features, followed by the assembling of these features into Contiguous Ancestral Regions (CARs). Results: In this paper, we present a new homology-based method that uses a progressive approach for both the detection and the assembling of ancestral contiguity features into CARs. The method is based on detecting a set of potential ancestral adjacencies iteratively using the current set of CARs at each step, and constructing CARs progressively using a 2-phase assembling method. We show the usefulness of the method through a reconstruction of the boreoeutherian ancestral gene order, and a comparison with three other homology-based methods: AnGeS, InferCARs and GapAdj. The program is written in Python, and the dataset used in this paper are available at http://bioinfo.lifl.fr/procars/

    Debugging long-read genome and metagenome assemblies using string graph analysis

    Get PDF
    National audienceThird-generation long-read sequencing technologies tackle the repeat problem in genome assembly by producing reads that are long enough to span most repeat instances. In principle one expects that with such reads most bacterial genomes will be assembled into a single contig. However in practice, some datasets fail to be perfectly assembled even with leading assemblers, and are fragmented into a handful of contigs. As a mean to investigate those cases, we consider the string graphs that are generated by assemblers during intermediate stages of the assembly process. We seek to establish a coherent framework for analyzing these graphs, in the hope that they will help us determine the biological causes that led the assembler to output shorter contigs. This poster presents some preliminary results of such an analysis

    Biomanycores, open-source parallel code for many-core bioinformatics

    Get PDF
    International audienceBiomanycores is a collection of bioinformatics tools, designed to bridge the gap between researches in OpenCL/CUDA high-performance computing on GPU and other "manycore processors" and usual bioinformaticians and biologists

    Biomanycores, open-source parallel code for many-core bioinformatics

    Get PDF
    International audienceBiomanycores is a collection of bioinformatics tools, designed to bridge the gap between researches in OpenCL/CUDA high-performance computing on GPU and other "manycore processors" and usual bioinformaticians and biologists
    corecore